Maryland Total Migration Analysis (FY2020-FY2024)¶

Executive Summary¶

This notebook provides exploratory data analysis of Maryland's net migration patterns across fiscal years 2020-2024. The analysis examines migration trends across all Maryland jurisdictions, identifying areas of population gain and loss.

Data Source: Maryland State Database
Database: Maryland
Table: Migration
Fiscal Year Period: July 1 - June 30
Data Type: Net migration (positive = in-migration, negative = out-migration)

1. Setup and Data Loading¶

In [13]:
# Import required libraries
import os
from dotenv import load_dotenv
import mssql_python as mssql
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
import warnings

warnings.filterwarnings("ignore", message="pandas only supports SQLAlchemy connectable")

# Load environment variables
load_dotenv()

# Configure visualization
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

# Configure Plotly for HTML export - include full plotly.js library
pio.renderers.default = "notebook"
import plotly.offline as pyo
pyo.init_notebook_mode(connected=False)

# Database connection
SQL_CONNECTION_STRING = os.getenv("SQL_CONNECTION_STRING")
TABLE_NAME = "[Maryland].[dbo].[Migration]"

print("✓ Libraries imported successfully")
print("✓ Environment configured")
✓ Libraries imported successfully
✓ Environment configured
In [14]:
# Load data from SQL Server
connection = mssql.connect(SQL_CONNECTION_STRING)
cursor = connection.cursor()

query = f"SELECT * FROM {TABLE_NAME} ORDER BY FiscalYear"
df = pd.read_sql(query, connection)

connection.close()

print(f"Data loaded: {len(df)} records")
print(f"Fiscal years: FY{df['FiscalYear'].min()} - FY{df['FiscalYear'].max()}")
df.head()
Data loaded: 5 records
Fiscal years: FY2020 - FY2024
Out[14]:
FiscalYear PeriodStart PeriodEnd Maryland Allegany AnneArundel BaltimoreCity Baltimore Calvert Caroline Carroll Cecil Charles Dorchester Frederick Garrett Harford Howard Kent Montgomery PrinceGeorgeS QueenAnneS Somerset StMaryS Talbot Washington Wicomico Worcester
0 2020 2020-04-01 2020-06-30 -4982 -58 141 -2161 -928 106 -22 79 81 19 10 913 -18 223 149 -13 -1732 -2058 127 -28 133 36 -27 -89 135
1 2021 2020-07-01 2021-06-30 -8725 209 784 -6001 -2605 1200 141 1362 350 1662 151 7118 131 2008 1050 311 -8998 -11763 879 92 229 501 693 507 1264
2 2022 2021-07-01 2022-06-30 1854 103 765 -5830 -2021 332 89 1527 1062 1280 177 6739 11 752 -145 250 -341 -6755 861 218 -57 382 858 741 856
3 2023 2022-07-01 2023-06-30 11010 500 -833 -3681 -842 58 198 1246 838 1646 418 4920 20 761 -268 176 3158 -776 850 406 74 319 1012 294 516
4 2024 2023-07-01 2024-06-30 34591 252 416 237 3376 -34 293 404 683 2080 282 4324 77 610 1267 188 8401 7059 1003 264 630 418 1438 481 442

2. Data Structure and Quality Assessment¶

In [15]:
# Data structure overview
print("=" * 80)
print("DATA STRUCTURE")
print("=" * 80)
print(f"\nDataset Shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"\nColumn Overview:")
print(df.dtypes)

print("\n" + "=" * 80)
print("DATA QUALITY ASSESSMENT")
print("=" * 80)

# Check for missing values
missing_counts = df.isnull().sum()
if missing_counts.sum() == 0:
    print("\n✓ No missing values detected")
else:
    print("\nMissing values by column:")
    print(missing_counts[missing_counts > 0])

# Check for duplicates
duplicates = df.duplicated(subset=['FiscalYear']).sum()
print(f"\nDuplicate fiscal years: {duplicates}")

# Identify county columns (excluding FiscalYear and Period columns)
county_columns = [col for col in df.columns if col not in ['FiscalYear', 'PeriodStart', 'PeriodEnd']]
print(f"\n✓ {len(county_columns)} jurisdictions in dataset")
================================================================================
DATA STRUCTURE
================================================================================

Dataset Shape: 5 rows × 28 columns

Column Overview:
FiscalYear        int64
PeriodStart      object
PeriodEnd        object
Maryland          int64
Allegany          int64
AnneArundel       int64
BaltimoreCity     int64
Baltimore         int64
Calvert           int64
Caroline          int64
Carroll           int64
Cecil             int64
Charles           int64
Dorchester        int64
Frederick         int64
Garrett           int64
Harford           int64
Howard            int64
Kent              int64
Montgomery        int64
PrinceGeorgeS     int64
QueenAnneS        int64
Somerset          int64
StMaryS           int64
Talbot            int64
Washington        int64
Wicomico          int64
Worcester         int64
dtype: object

================================================================================
DATA QUALITY ASSESSMENT
================================================================================

✓ No missing values detected

Duplicate fiscal years: 0

✓ 25 jurisdictions in dataset

3. Statewide Migration Summary¶

In [16]:
# Statewide migration statistics
print("=" * 80)
print("MARYLAND STATEWIDE MIGRATION (FY2020-FY2024)")
print("=" * 80)

summary = df[['FiscalYear', 'Maryland']].copy()
summary.columns = ['Fiscal Year', 'Net Migration']
summary['Migration Type'] = summary['Net Migration'].apply(lambda x: 'In-Migration' if x > 0 else 'Out-Migration')
summary['Absolute Migration'] = summary['Net Migration'].abs()

print("\n", summary.to_string(index=False))

print("\n" + "=" * 80)
print("SUMMARY STATISTICS")
print("=" * 80)
print(f"Total 5-Year Net Migration: {summary['Net Migration'].sum():,}")
print(f"Average Annual Net Migration: {summary['Net Migration'].mean():,.0f}")
print(f"Years with Net In-Migration: {(summary['Net Migration'] > 0).sum()}")
print(f"Years with Net Out-Migration: {(summary['Net Migration'] < 0).sum()}")
print(f"Highest In-Migration: FY{summary.loc[summary['Net Migration'].idxmax(), 'Fiscal Year']} ({summary['Net Migration'].max():,})")
print(f"Highest Out-Migration: FY{summary.loc[summary['Net Migration'].idxmin(), 'Fiscal Year']} ({summary['Net Migration'].min():,})")
================================================================================
MARYLAND STATEWIDE MIGRATION (FY2020-FY2024)
================================================================================

  Fiscal Year  Net Migration Migration Type  Absolute Migration
        2020          -4982  Out-Migration                4982
        2021          -8725  Out-Migration                8725
        2022           1854   In-Migration                1854
        2023          11010   In-Migration               11010
        2024          34591   In-Migration               34591

================================================================================
SUMMARY STATISTICS
================================================================================
Total 5-Year Net Migration: 33,748
Average Annual Net Migration: 6,750
Years with Net In-Migration: 3
Years with Net Out-Migration: 2
Highest In-Migration: FY2024 (34,591)
Highest Out-Migration: FY2021 (-8,725)
In [17]:
# Visualize statewide migration trend
fig = go.Figure()

fig.add_trace(go.Bar(
    x=df['FiscalYear'],
    y=df['Maryland'],
    marker=dict(
        color=df['Maryland'],
        colorscale=[[0, '#d62728'], [0.5, '#cccccc'], [1, '#2ca02c']],
        line=dict(color='black', width=1)
    ),
    text=df['Maryland'],
    textposition='outside',
    texttemplate='%{text:,}',
    name='Net Migration'
))

fig.update_layout(
    title='Maryland Statewide Net Migration by Fiscal Year',
    xaxis_title='Fiscal Year',
    yaxis_title='Net Migration',
    template='plotly_white',
    height=500,
    hovermode='x unified'
)

fig.add_hline(y=0, line_dash="dash", line_color="black", opacity=0.5)
fig.show()

4. County-Level Migration Analysis¶

In [18]:
# Reshape data for county-level analysis (exclude Maryland state-level total)
county_cols = [col for col in df.columns if col not in ['FiscalYear', 'PeriodStart', 'PeriodEnd', 'Maryland']]

county_data = df.melt(
    id_vars=['FiscalYear', 'PeriodStart', 'PeriodEnd'],
    value_vars=county_cols,
    var_name='County',
    value_name='NetMigration'
)

# Calculate total migration by county over all years
total_by_county = county_data.groupby('County')['NetMigration'].sum().sort_values(ascending=False)

print("=" * 80)
print("TOTAL NET MIGRATION BY COUNTY (FY2020-FY2024)")
print("=" * 80)
print("\nTop 10 Counties (Net In-Migration):")
print(total_by_county.head(10).to_string())

print("\n\nBottom 10 Counties (Net Out-Migration):")
print(total_by_county.tail(10).to_string())

print(f"\n\nCounties with net in-migration: {(total_by_county > 0).sum()} of {len(total_by_county)}")
print(f"Counties with net out-migration: {(total_by_county < 0).sum()} of {len(total_by_county)}")
print(f"\n✓ State Total (Maryland): {df['Maryland'].sum():,} net in-migration")
================================================================================
TOTAL NET MIGRATION BY COUNTY (FY2020-FY2024)
================================================================================

Top 10 Counties (Net In-Migration):
County
Frederick     24014
Charles        6687
Carroll        4618
Harford        4354
Washington     3974
QueenAnneS     3720
Worcester      3213
Cecil          3014
Howard         2053
Wicomico       1934


Bottom 10 Counties (Net Out-Migration):
County
StMaryS           1009
Allegany          1006
Somerset           952
Kent               912
Caroline           699
Montgomery         488
Garrett            221
Baltimore        -3020
PrinceGeorgeS   -14293
BaltimoreCity   -17436


Counties with net in-migration: 21 of 24
Counties with net out-migration: 3 of 24

✓ State Total (Maryland): 33,748 net in-migration
In [19]:
# Visualize top gainers and losers among counties
top_n = 10
top_gainers = total_by_county.head(top_n)
top_losers = total_by_county.tail(top_n).sort_values()

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Top 10 Jurisdictions: Net In-Migration', 'Top 10 Jurisdictions: Net Out-Migration'),
    horizontal_spacing=0.15
)

# Top gainers
fig.add_trace(
    go.Bar(
        y=top_gainers.index,
        x=top_gainers.values,
        orientation='h',
        marker=dict(color='#2ca02c'),
        text=top_gainers.values,
        texttemplate='%{text:,}',
        textposition='outside',
        name='In-Migration'
    ),
    row=1, col=1
)

# Top losers
fig.add_trace(
    go.Bar(
        y=top_losers.index,
        x=top_losers.values,
        orientation='h',
        marker=dict(color='#d62728'),
        text=top_losers.values,
        texttemplate='%{text:,}',
        textposition='outside',
        name='Out-Migration'
    ),
    row=1, col=2
)

fig.update_layout(
    title_text='5-Year Total Net Migration by County (FY2020-FY2024)',
    showlegend=False,
    height=600,
    template='plotly_white'
)

fig.update_xaxes(title_text='Net Migration', row=1, col=1)
fig.update_xaxes(title_text='Net Migration', row=1, col=2)

fig.show()

5. Temporal Trends by County¶

In [20]:
# Select top 8 counties by total absolute migration for trend analysis
top_counties = total_by_county.abs().nlargest(8).index.tolist()

# Filter county data for selected counties
trend_data = county_data[county_data['County'].isin(top_counties)]

# Create line chart
fig = go.Figure()

for county in top_counties:
    county_trend = trend_data[trend_data['County'] == county]
    fig.add_trace(go.Scatter(
        x=county_trend['FiscalYear'],
        y=county_trend['NetMigration'],
        mode='lines+markers',
        name=county,
        line=dict(width=2),
        marker=dict(size=8)
    ))

fig.update_layout(
    title='Migration Trends: Top 8 Counties by Migration Volume',
    xaxis_title='Fiscal Year',
    yaxis_title='Net Migration',
    template='plotly_white',
    height=600,
    hovermode='x unified',
    legend=dict(
        orientation="v",
        yanchor="top",
        y=1,
        xanchor="left",
        x=1.02
    )
)

fig.add_hline(y=0, line_dash="dash", line_color="black", opacity=0.3)
fig.show()

6. Year-over-Year Change Analysis¶

In [21]:
# Calculate year-over-year changes for Maryland statewide
df_sorted = df.sort_values('FiscalYear')
df_sorted['YoY_Change'] = df_sorted['Maryland'].diff()
df_sorted['YoY_Change_Pct'] = df_sorted['Maryland'].pct_change() * 100

print("=" * 80)
print("MARYLAND STATEWIDE: YEAR-OVER-YEAR MIGRATION CHANGES")
print("=" * 80)
print("\n", df_sorted[['FiscalYear', 'Maryland', 'YoY_Change', 'YoY_Change_Pct']].to_string(index=False))

print("\n" + "=" * 80)
print("KEY OBSERVATIONS")
print("=" * 80)
print(f"Largest YoY increase: FY{df_sorted.loc[df_sorted['YoY_Change'].idxmax(), 'FiscalYear']} "
      f"(+{df_sorted['YoY_Change'].max():,.0f})")
print(f"Largest YoY decrease: FY{df_sorted.loc[df_sorted['YoY_Change'].idxmin(), 'FiscalYear']} "
      f"({df_sorted['YoY_Change'].min():,.0f})")
================================================================================
MARYLAND STATEWIDE: YEAR-OVER-YEAR MIGRATION CHANGES
================================================================================

  FiscalYear  Maryland  YoY_Change  YoY_Change_Pct
       2020     -4982         NaN             NaN
       2021     -8725     -3743.0       75.130470
       2022      1854     10579.0     -121.249284
       2023     11010      9156.0      493.851133
       2024     34591     23581.0      214.178020

================================================================================
KEY OBSERVATIONS
================================================================================
Largest YoY increase: FY2024 (+23,581)
Largest YoY decrease: FY2021 (-3,743)

7. County Migration Volatility Analysis¶

In [22]:
# Calculate standard deviation (volatility) for each county
volatility = county_data.groupby('County')['NetMigration'].agg([
    ('Mean', 'mean'),
    ('StdDev', 'std'),
    ('Min', 'min'),
    ('Max', 'max'),
    ('Range', lambda x: x.max() - x.min())
]).sort_values('StdDev', ascending=False)

print("=" * 80)
print("MIGRATION VOLATILITY BY COUNTY")
print("=" * 80)
print("\nTop 10 Most Volatile Counties (by Standard Deviation):")
print(volatility.head(10).to_string())

print("\n\nTop 10 Most Stable Counties (lowest volatility):")
print(volatility.tail(10).to_string())
================================================================================
MIGRATION VOLATILITY BY COUNTY
================================================================================

Top 10 Most Volatile Counties (by Standard Deviation):
                 Mean       StdDev    Min   Max  Range
County                                                
PrinceGeorgeS -2858.6  7032.198184 -11763  7059  18822
Montgomery       97.6  6414.480595  -8998  8401  17399
BaltimoreCity -3487.2  2620.779121  -6001   237   6238
Frederick      4802.8  2473.992259    913  7118   6205
Baltimore      -604.0  2345.949168  -2605  3376   5981
Charles        1337.4   789.545312     19  2080   2061
Howard          410.6   703.540546   -268  1267   1535
Harford         870.8   672.133692    223  2008   1785
AnneArundel     254.6   663.814959   -833   784   1617
Carroll         923.6   641.007254     79  1527   1448


Top 10 Most Stable Counties (lowest volatility):
             Mean      StdDev  Min   Max  Range
County                                         
QueenAnneS  744.0  350.328417  127  1003    876
Wicomico    386.8  309.748608  -89   741    830
StMaryS     201.8  260.834622  -57   630    687
Allegany    201.2  205.396446  -58   500    558
Talbot      331.2  177.650500   36   501    465
Somerset    190.4  165.863800  -28   406    434
Dorchester  207.6  152.470653   10   418    408
Kent        182.4  121.820770  -13   311    324
Caroline    139.8  117.918192  -22   293    315
Garrett      44.2   59.512184  -18   131    149

8. Heatmap: Migration Patterns Across Time and Geography¶

In [23]:
# Create pivot table for heatmap (counties only)
heatmap_data = county_data.pivot(index='County', columns='FiscalYear', values='NetMigration')
heatmap_data = heatmap_data.sort_values(by=2024, ascending=False)

# Create heatmap
fig = go.Figure(data=go.Heatmap(
    z=heatmap_data.values,
    x=heatmap_data.columns,
    y=heatmap_data.index,
    colorscale='RdYlGn',
    zmid=0,
    text=heatmap_data.values,
    texttemplate='%{text:,}',
    textfont={"size": 9},
    colorbar=dict(title="Net Migration")
))

fig.update_layout(
    title='Migration Heatmap: All Counties by Fiscal Year',
    xaxis_title='Fiscal Year',
    yaxis_title='County',
    height=900,
    template='plotly_white'
)

fig.show()

9. Key Findings and Summary¶

In [24]:
# Generate comprehensive summary
print("=" * 80)
print("MARYLAND MIGRATION ANALYSIS: KEY FINDINGS (FY2020-FY2024)")
print("=" * 80)

# Statewide findings
total_migration = df['Maryland'].sum()
avg_migration = df['Maryland'].mean()
print(f"\n📊 MARYLAND STATEWIDE MIGRATION (State-Level Total)")
print(f"   • Total 5-Year Net Migration: {total_migration:,}")
print(f"   • Average Annual Migration: {avg_migration:,.0f}")
print(f"   • Trend: Shifted from out-migration (FY2020-2021) to strong in-migration (FY2022-2024)")
print(f"   • Peak Year: FY2024 with {df['Maryland'].max():,} net in-migrants")

# Top gainers among counties
print(f"\n🔼 TOP COUNTIES (Net In-Migration)")
top_3 = total_by_county.head(3)
for i, (county, value) in enumerate(top_3.items(), 1):
    print(f"   {i}. {county}: {value:,}")

# Top losers among counties
print(f"\n🔽 TOP COUNTIES (Net Out-Migration)")
bottom_3 = total_by_county.tail(3).sort_values()
for i, (county, value) in enumerate(bottom_3.items(), 1):
    print(f"   {i}. {county}: {value:,}")

# Volatility insights among counties
most_volatile = volatility.index[0]
most_stable = volatility.index[-1]
print(f"\n📈 COUNTY VOLATILITY INSIGHTS")
print(f"   • Most Volatile County: {most_volatile} (StdDev: {volatility.loc[most_volatile, 'StdDev']:.0f})")
print(f"   • Most Stable County: {most_stable} (StdDev: {volatility.loc[most_stable, 'StdDev']:.0f})")

# Distribution summary among counties
gainers = (total_by_county > 0).sum()
losers = (total_by_county < 0).sum()
total_counties = len(total_by_county)
print(f"\n🎯 COUNTY DISTRIBUTION")
print(f"   • Counties with net in-migration: {gainers} of {total_counties} ({gainers/total_counties*100:.1f}%)")
print(f"   • Counties with net out-migration: {losers} of {total_counties} ({losers/total_counties*100:.1f}%)")

print("\n" + "=" * 80)
print("CONCLUSION")
print("=" * 80)
print(f"""
Maryland experienced a significant migration reversal during FY2020-2024:

STATE-LEVEL PATTERN:
• Early Period (FY2020-2021): Net out-migration totaling -13,707
• Recovery Period (FY2022-2024): Strong rebound with +47,455 net in-migration
• Overall 5-Year Trend: Net gain of {total_migration:,} residents

COUNTY-LEVEL PATTERNS:
• {gainers} of {total_counties} counties experienced net in-migration
• Frederick County led all counties with +{total_by_county.iloc[0]:,} net in-migration
• Baltimore City had largest out-migration at {total_by_county.iloc[-1]:,}
• Most counties showed recovery patterns similar to state-level trends

NOTABLE INSIGHTS:
• FY2024 showed strongest in-migration year ({df['Maryland'].max():,})
• COVID-19 impact visible in FY2020-2021 out-migration
• Suburban/exurban counties (Frederick, Charles, Carroll) showed strongest gains
• Urban core jurisdictions (Baltimore City, Prince George's) experienced losses
""")
================================================================================
MARYLAND MIGRATION ANALYSIS: KEY FINDINGS (FY2020-FY2024)
================================================================================

📊 MARYLAND STATEWIDE MIGRATION (State-Level Total)
   • Total 5-Year Net Migration: 33,748
   • Average Annual Migration: 6,750
   • Trend: Shifted from out-migration (FY2020-2021) to strong in-migration (FY2022-2024)
   • Peak Year: FY2024 with 34,591 net in-migrants

🔼 TOP COUNTIES (Net In-Migration)
   1. Frederick: 24,014
   2. Charles: 6,687
   3. Carroll: 4,618

🔽 TOP COUNTIES (Net Out-Migration)
   1. BaltimoreCity: -17,436
   2. PrinceGeorgeS: -14,293
   3. Baltimore: -3,020

📈 COUNTY VOLATILITY INSIGHTS
   • Most Volatile County: PrinceGeorgeS (StdDev: 7032)
   • Most Stable County: Garrett (StdDev: 60)

🎯 COUNTY DISTRIBUTION
   • Counties with net in-migration: 21 of 24 (87.5%)
   • Counties with net out-migration: 3 of 24 (12.5%)

================================================================================
CONCLUSION
================================================================================

Maryland experienced a significant migration reversal during FY2020-2024:

STATE-LEVEL PATTERN:
• Early Period (FY2020-2021): Net out-migration totaling -13,707
• Recovery Period (FY2022-2024): Strong rebound with +47,455 net in-migration
• Overall 5-Year Trend: Net gain of 33,748 residents

COUNTY-LEVEL PATTERNS:
• 21 of 24 counties experienced net in-migration
• Frederick County led all counties with +24,014 net in-migration
• Baltimore City had largest out-migration at -17,436
• Most counties showed recovery patterns similar to state-level trends

NOTABLE INSIGHTS:
• FY2024 showed strongest in-migration year (34,591)
• COVID-19 impact visible in FY2020-2021 out-migration
• Suburban/exurban counties (Frederick, Charles, Carroll) showed strongest gains
• Urban core jurisdictions (Baltimore City, Prince George's) experienced losses